View Frustum Culling for Free Viewpoint Video (FVV)

ABSTRACT

The view frustum culling technique described herein allows Free Viewpoint Video (FVV) or other 3D spatial video rendering at a client by sending only the 3D geometry and texture (e.g., RGB) data necessary for a specific viewpoint or view frustum from a server to the rendering client. The synthetic viewpoint is then rendered by the client by using the received geometry and texture data for the specific viewpoint or view frustum. In some embodiments of the view frustum culling technique, the client has both some texture data and 3D geometric data stored locally if there is sufficient local processing power. Additionally, in some embodiments, additional spatial and temporal data can be sent to the client to support changes in the view frustum by providing additional geometry and texture data that will likely be immediately used if the viewpoint is changed either spatially or temporally.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of and priority to a prior provisional U.S. patent application entitled “INTERACTIVE SPATIAL VIDEO”, which was assigned Ser. No. 61/653,983 and was filed May 31, 2012.

BACKGROUND

A traditional video generally includes one or more scenes, where each scene in the video can be either relatively static (e.g., the objects in the scene do not substantially change or move over time) or dynamic (e.g., the objects in the scene substantially change and/or move over time). In a traditional video the viewpoint of each scene is chosen by the director when the video is recorded or captured, and this viewpoint cannot be controlled or changed by an end user while they are viewing the video. In other words, in a traditional video the viewpoint of each scene is fixed and cannot be modified when the video is being rendered and displayed.

Free Viewpoint Video (FVV) is created from images captured by multiple cameras viewing a scene from different viewpoints. FVV generally allows a user to look at a scene from synthetic viewpoints that are created from the captured images and to navigate around the scene. More specifically, in FVV an end user can interactively control and change their viewpoint of each scene at will while they are viewing the video. In other words, in an FVV each end user can interactively generate synthetic (i.e., virtual) viewpoints of each scene on-the-fly while the video is being rendered and displayed. This creates a feeling of immersion for any end user who is viewing a rendering of the captured scene, thus enhancing their viewing experience.

The creation and playback of a FVV requires working with a substantial amount of data. The process of creating and playing back FVV or other 3D spatial video typically is as follows. First, a scene is simultaneously recorded from many different perspectives using sensors such as RGB cameras and other video and audio capture devices. Second, the captured video data is processed to extract 3D geometric information in the form of geometric proxies using 3D Reconstruction (3DR) algorithms. Finally, the original texture data (e.g., RGB data) and geometric proxies are recombined during rendering, for example by using Image-Based Rendering (IBR) algorithms, to generate synthetic viewpoints of the scene.

The amount of data may vary considerably from one FVV to another due to differences in the number of sensors used to record the scene, the length of the FVV, the type of 3DR algorithms used to process the data, and the type of IBR algorithm used to generate synthetic views of the scene.

There exists a wide variety of different combinations of both bandwidth and local processing power that can be used for viewing FVV on a client.

SUMMARY

In general, embodiments of the view frustum culling technique described herein transfer the data necessary to render a given viewpoint or view frustum of a FVV or other three-dimensional (3D) spatial video over a network, from one or more servers to a client that renders the FVV or 3D spatial video.

In some embodiments of the view frustum culling technique, only the 3D geometry and texture data (e.g., RGB texture data) necessary for rendering a specific synthetic viewpoint or view frustum of a FVV or 3D spatial video are transmitted from a server (or computing cloud) to a client. The video for the synthetic viewpoint is then rendered by the client using the received 3D geometry and texture data. One benefit of these embodiments of the view frustum culling technique is that only the data necessary to render a specific viewpoint is transferred from the server to the client. This limits the amount of bandwidth required to transfer FVV or 3D spatial video to a client.

In some embodiments of the view frustum culling technique, the client stores some texture data and 3D geometric data locally if there is sufficient local processing power. Local data at the client and sufficient processing power can lead to more fluid and seamless transitions as the virtual viewpoint is moved around within a FVV scene. In addition, for static or non-moving elements of the scene, 3D geometry can be cached locally on the client, eliminating the need for redundant data transfers.

Finally, in some embodiments of the view frustum culling technique, additional spatial and temporal data can be sent to the client from the server so that the data necessary to support a desired view frustum is supplemented with additional geometry and texture data that would be immediately used if the viewpoint was changed either spatially or temporally.

It is noted that this Summary is provided to introduce a selection of concepts, in a simplified form, that are further described hereafter in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

DESCRIPTION OF THE DRAWINGS

The specific features, aspects, and advantages of the disclosure will become better understood with regard to the following description, appended claims, and accompanying drawings where:

FIG. 1 depicts a high level flow diagram of an exemplary process for practicing the view frustum culling technique described herein.

FIG. 2 depicts another flow diagram of an exemplary process for practicing the view frustum culling technique described herein from the perspective of a server.

FIG. 3 depicts another flow diagram of an exemplary process for playing FVV content at a client according to the view frustum culling technique.

FIG. 4 depicts one exemplary embodiment of the view frustum culling technique described herein wherein the geometric data and texture data of the view frustum are divided into increasingly smaller three dimensional cells.

FIG. 5 is an exemplary architecture for practicing one exemplary embodiment of the view frustum culling technique described herein.

FIG. 6 is a diagram illustrating a spatial three dimensional video pipeline in which the view frustum culling technique described herein can be practiced.

FIG. 7 is a schematic of an exemplary computing environment which can be used to practice the view frustum culling technique.

DETAILED DESCRIPTION

In the following description of the view frustum culling technique, reference is made to the accompanying drawings, which form a part thereof, and which show by way of illustration examples by which the view frustum culling technique described herein may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the claimed subject matter.

1.0 View Frustum Culling Technique

The following sections provide background information and an overview of the view frustum culling technique, as well as exemplary processes and an exemplary architecture for practicing the technique. Details of various embodiments of the view frustum culling technique are also provided, as is a description of an exemplary spatial video pipeline and a suitable computing environment for practicing the technique.

It is also noted that for the sake of clarity specific terminology will be resorted to in describing the view frustum culling technique embodiments described herein and it is not intended for these embodiments to be limited to the specific terms so chosen. Furthermore, it is to be understood that each specific term includes all its technical equivalents that operate in a broadly similar manner to achieve a similar purpose. Reference herein to “one embodiment”, or “another embodiment”, or an “exemplary embodiment”, or an “alternate embodiment”, or “one implementation”, or “another implementation”, or an “exemplary implementation”, or an “alternate implementation” means that a particular feature, a particular structure, or particular characteristics described in connection with the embodiment or implementation can be included in at least one embodiment of the view frustum culling technique. The appearances of the phrases “in one embodiment”, “in another embodiment”, “in an exemplary embodiment”, “in an alternate embodiment”, “in one implementation”, “in another implementation”, “in an exemplary implementation”, and “in an alternate implementation” in various places in the specification are not necessarily all referring to the same embodiment or implementation, nor are separate or alternative embodiments/implementations mutually exclusive of other embodiments/implementations. Yet furthermore, the order of process flow representing one or more embodiments or implementations of the view frustum culling technique does not inherently indicate any particular order nor imply any limitations of the technique.

The term “sensor” is used herein to refer to any one of a variety of scene-sensing devices which can be used to generate sensor data that represents a given scene. Each of the sensors can be any type of video capture device (e.g., any type of video camera).

The term “server” is used herein to refer to one or more server computing devices either operating in a stand-alone server-client mode or operating in a computing cloud infrastructure so as to provide FVV or 3D spatial video services to a client computer over a data communication network.

A view frustum is the region of space in a modeled world that might appear on a screen; it is the field of view of a notional camera. View frustum culling is the process of removing objects that lie completely outside the viewing frustum from the rendering process.
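
For illustration, the following is a minimal sketch of a conventional frustum culling test of the kind this definition implies. C++ is used here for concreteness; the `Plane`, `AABB`, and `Frustum` types and the inward-facing plane convention are assumptions of the sketch, not part of the embodiments described herein.

```cpp
// Minimal sketch of a conventional view frustum culling test. A frustum is
// held as six inward-facing planes; an axis-aligned bounding box is culled
// when it lies entirely outside any one plane.
#include <array>
#include <cstdio>

struct Plane  { float nx, ny, nz, d; };           // nx*x + ny*y + nz*z + d >= 0 is "inside"
struct AABB   { float minX, minY, minZ, maxX, maxY, maxZ; };
using Frustum = std::array<Plane, 6>;             // left, right, bottom, top, near, far

// "Positive vertex" test: pick the box corner farthest along the plane
// normal; if even that corner is outside the plane, the whole box is outside.
bool BoxOutsideFrustum(const Frustum& f, const AABB& b) {
    for (const Plane& p : f) {
        float x = (p.nx >= 0.0f) ? b.maxX : b.minX;
        float y = (p.ny >= 0.0f) ? b.maxY : b.minY;
        float z = (p.nz >= 0.0f) ? b.maxZ : b.minZ;
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0.0f)
            return true;                          // completely outside this plane: cull
    }
    return false;                                 // intersects or is inside the frustum
}

int main() {
    // A frustum looking down +z from the origin (hypothetical numbers chosen
    // only to exercise the test).
    Frustum f = {{ { 1, 0, 0, 1 }, { -1, 0, 0, 1 },    // left, right
                   { 0, 1, 0, 1 }, { 0, -1, 0, 1 },    // bottom, top
                   { 0, 0, 1, 0 }, { 0, 0, -1, 10 } }}; // near, far
    AABB inside  { -0.5f, -0.5f, 1.0f, 0.5f, 0.5f, 2.0f };
    AABB outside {  5.0f,  5.0f, 1.0f, 6.0f, 6.0f, 2.0f };
    std::printf("inside culled:  %d\n", BoxOutsideFrustum(f, inside));   // 0
    std::printf("outside culled: %d\n", BoxOutsideFrustum(f, outside));  // 1
}
```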

1.1 Overview of the Technique

In general, the view frustum culling technique described herein transfers Free Viewpoint Video (FVV) from a server to a client over a network, such as, for example, the Internet or a proprietary intranet.

The view frustum culling technique embodiments described herein generally involve providing a FVV that delivers a consistent and manageable amount of data to a client despite the large amounts of data typically demanded to create and render the FVV. In one general embodiment, this is accomplished by first capturing a scene using an arrangement of sensors. This sensor arrangement includes a plurality of sensors that generate a plurality of streams of sensor data, where each stream represents the scene from a different geometric perspective. These streams of sensor data are input and calibrated, and then geometric proxies and texture data are generated from the calibrated streams of sensor data. The geometric proxies and texture data describe the scene as a function of time. Next, a current synthetic viewpoint of the scene is received from a client computing device via a data communication network. This current synthetic viewpoint was selected by an end user of the client computing device. Once a current synthetic viewpoint is received, the geometric proxies and texture data necessary to render the given synthetic viewpoint or view frustum are computed or selected by the server, for example from a FVV database that stores that type of data generated using the scene proxies. These selected geometric proxies and texture data, which depict at least a portion of the scene as viewed from the current synthetic viewpoint, are transmitted to the client computing device via the data communication network for rendering at the client and display to the end user of the client computing device.

From the perspective of a client computing device, a FVV produced as described above is played at the client in one general embodiment as follows. A request is received from an end user to display a FVV selection user interface screen that allows the end user to select a FVV available for playing. This FVV selection user interface screen is displayed on a display device, and an end user FVV selection is input. The end user FVV selection is then transmitted to a server via a data communication network. The client computing device then receives an instruction from the server via the data communication network to instantiate end user controls appropriate for the type of FVV selected. In response, an appropriate FVV control user interface is provided to the end user. The client computing device then monitors end user inputs via the FVV control user interface, and whenever an end user viewpoint navigation input is received, it is transmitted to the server via the data communication network. FVV geometric proxies and texture data to render the requested viewpoint or view frustum are then received from the server. These geometric proxies and texture data are rendered at the client so as to render at least a portion of the captured scene as it would be viewed from the last viewpoint input by the end user, and the result is displayed on the aforementioned display device as it is received.

As discussed above, some embodiments of the view frustum culling technique transfer only the 3D geometry data and texture data necessary to render a specific viewpoint or view frustum from the server to the client. The synthetic viewpoint is then rendered by the client using the received 3D geometry and texture data. This approach has the advantage of providing a consistent and manageable amount of data to a client, or several clients, because only the geometric data and texture data necessary to display a specific viewpoint or view frustum desired by a user of the client are sent to the client.

In some embodiments of the view frustum culling technique, however, some additional spatial and temporal data beyond that needed to render the client's requested viewpoint or view frustum can be sent to the client from the server. In these embodiments the data necessary to support the view frustum is supplemented with additional geometry data and texture data that would be immediately used if the viewpoint was changed either spatially or temporally at the client. For example, geometry data and texture data at the edge of the view frustum for the selected viewpoint can be sent to the client.

Furthermore, in some embodiments of the view frustum culling technique, the FVV client stores texture data and 3D geometric data locally if there is sufficient local processing power, which can provide more fluid and seamless transitions of rendering a FVV scene as the virtual viewpoint is moved around within the scene. In addition, for static or non-moving elements of the scene, previously received 3D geometry or texture data can be cached locally on the client, eliminating the need for redundant data transfers.

An overview of the view frustum culling technique having been provided, the following paragraphs describe exemplary processes and an exemplary architecture for practicing the view frustum culling technique.

1.2 Exemplary Processes

FIG. 1 depicts one exemplary computer-implemented process 100 for streaming FVV to a client according to the view frustum culling technique. As shown in FIG. 1, block 102, only the texture data (e.g., RGB data) and geometric data for a given view frustum is received at a client from a server. Next, a given viewpoint of the spatial three dimensional video is rendered and displayed at the client using only the downloaded texture and geometric data for the given view frustum, as shown in block 104. Texture data (e.g., RGB data) or geometric data which has not changed on the client does not have to be downloaded again.
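
The "not downloaded again" behavior can be pictured as a simple cache check on the client. The sketch below is illustrative only; the (id, version) asset keying is a hypothetical scheme, as the text does not specify how texture and geometric data are identified.

```cpp
// Illustrative sketch (not the embodiments' actual data model): assets the
// client already holds, identified here by a hypothetical (id, version)
// pair, are not requested from the server again.
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct AssetKey { uint64_t id; uint32_t version; };          // hypothetical key

// Cache of texture/geometry blobs the client has already received.
using AssetCache = std::unordered_map<uint64_t, uint32_t>;   // id -> version held

// Given the asset list the server says the view frustum needs, return only
// the ones the client is missing (or holds at a stale version).
std::vector<AssetKey> AssetsToRequest(const AssetCache& cache,
                                      const std::vector<AssetKey>& needed) {
    std::vector<AssetKey> missing;
    for (const AssetKey& k : needed) {
        auto it = cache.find(k.id);
        if (it == cache.end() || it->second != k.version)
            missing.push_back(k);                // not cached, or changed on server
    }
    return missing;
}

int main() {
    AssetCache cache{{7, 1}, {9, 2}};            // client already holds 7 and 9
    std::vector<AssetKey> needed{{7, 1}, {8, 1}, {9, 3}};
    auto missing = AssetsToRequest(cache, needed);           // 8 (new), 9 (stale)
    std::printf("%zu assets to request\n", missing.size());  // prints 2
}
```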

A modification to the process described above is that, in addition to only the data necessary to render a specific viewpoint or view frustum, some additional spatial or temporal data is also sent from the server to the client. Small changes in the spatial or temporal navigation are anticipated, and the data is sent to the client prior to rendering. For example, additional texture data and corresponding geometric data at the edges of the client's requested viewpoint or view frustum are sent to the client in addition to the 3D geometry and texture data necessary to render the viewpoint requested by the client. More specifically, given a current viewpoint, a user's view of a scene will include a corresponding view frustum for which geometry data and texture data are sent. However, if the time it takes to send this data from the server to the client is known, how far a client's position and viewpoint can change in this time can be computed. Hence it is possible to send the additional geometry and texture data corresponding to the maximum distance the user can move in the time it takes to send data from the server to the client. Additionally, geometry and texture data can be sent to the client based on a viewpoint predicted from the client's rate of viewpoint change. This predicted viewpoint can be calculated, for example, by computing a maximum bounding volume that will contain the user's viewpoint based on the velocity at which the user is moving and the time it takes to transmit geometry data and texture data to the client. Additionally, a lower level of detail of geometric data can be sent to the client for viewpoints that the client has a lower probability of reaching. For example, if the user's velocity (V) and the time it takes to send data from the server to the client (t) are known, one can compute that the most the user can move is P′=P+tV, where P is their current location and P′ is the farthest they can move in time t. Furthermore, a user is less likely to see an object if they need to move the entire allowable distance for it to come into view, which means that a lower level of detail can be sent for the object. Similarly, a lower level of detail of texture data and geometric data can be sent for objects in the distance of the client's view frustum. Yet another variation of the process described above includes provisions for reducing detail based on the angular velocity of the camera required to bring objects into view; i.e., objects that are farther away angularly will translate into faster camera motion, thus the rendering will be more motion blurred and less detail need be rendered.
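
The displacement bound and level of detail fall-out of this paragraph can be expressed directly. In the hedged sketch below, `PredictFarthestViewpoint` codifies P′ = P + tV, and the level-of-detail heuristic is one plausible reading of the paragraph, not a prescribed formula; all helper names are hypothetical.

```cpp
// Sketch: given round-trip time t and user velocity V, the farthest the
// viewpoint can move is P' = P + t*V, so data can be prefetched for a
// bounding volume of radius t*|V| around P, at a level of detail that falls
// off with how far the user would have to travel.
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

Vec3 PredictFarthestViewpoint(Vec3 p, Vec3 v, float t) {
    return { p.x + t * v.x, p.y + t * v.y, p.z + t * v.z };  // P' = P + tV
}

float ReachableRadius(Vec3 v, float t) {                      // radius of the
    return t * std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);  // bounding volume
}

// Coarser level of detail (higher number) for objects the user could only
// see after moving most of the allowable distance.
int LevelOfDetail(float distanceToTravel, float reachableRadius, int maxLod) {
    if (reachableRadius <= 0.0f) return 0;                    // 0 = full detail
    float fraction = std::min(distanceToTravel / reachableRadius, 1.0f);
    return static_cast<int>(fraction * maxLod);
}

int main() {
    Vec3 p{0, 0, 0}, v{2, 0, 0};            // moving 2 units/s along x
    float t = 0.25f;                        // 250 ms server-to-client latency
    Vec3 pMax = PredictFarthestViewpoint(p, v, t);
    float r = ReachableRadius(v, t);        // 0.5 units
    std::printf("P' = (%.2f, %.2f, %.2f), radius %.2f, LOD at edge = %d\n",
                pMax.x, pMax.y, pMax.z, r, LevelOfDetail(0.5f, r, 4));
}
```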

FIG. 2 depicts another exemplary computer-implemented process 200 for sending a FVV from one or more servers to a client according to the view frustum culling technique. In the embodiment shown in FIG. 2, a scene is captured using an arrangement of sensors (block 202). This sensor arrangement includes a plurality of sensors that generate a plurality of streams of sensor data, where each stream represents the scene from a different geometric perspective. These streams of sensor data are input and calibrated (block 204), and then scene geometric data and texture data are generated via conventional means from the calibrated streams of sensor data and are stored at the server (block 206). The geometric data and texture data describe the scene as a function of time. Next, a current synthetic viewpoint of the scene, or its associated view frustum, is received from a client computing device via a data communication network (block 208). This current synthetic viewpoint can be accompanied by the client's display characteristics if they are necessary to compute the view frustum for the current synthetic viewpoint. It is noted that this current synthetic viewpoint was selected by an end user of the client computing device. Once a current synthetic viewpoint is received, the geometric data and texture data to render the given synthetic viewpoint or view frustum are retrieved from the location where they were stored (e.g., from a database) at the server (block 210) and are transmitted to the client computing device via the data communication network for rendering and display to the end user of the client computing device (block 212).
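
Blocks 208 through 212 amount to a single request/response handler on the server. The following sketch assumes hypothetical types and trivial placeholder bodies, since the text leaves the storage layout and transport unspecified.

```cpp
// Server-side sketch of blocks 208-212. All types and helpers are
// hypothetical placeholders; the trivial bodies only keep the sketch
// self-contained.
#include <cstdint>
#include <vector>

struct Viewpoint { float pos[3], dir[3]; float fovY; };      // from the client
struct Display   { int width, height; };                     // display characteristics
struct Frustum   { float planes[6][4]; };                    // six frustum planes
struct Payload   { std::vector<uint8_t> geometry, texture; };

Frustum ComputeFrustum(const Viewpoint&, const Display&) { return {}; } // block 208
Payload LookUpFrustumData(const Frustum&, double /*videoTime*/) {       // block 210
    return {};                                // e.g., a database query in practice
}
void SendToClient(int /*clientId*/, const Payload&) {}                  // block 212

// One request/response round: only the geometric data and texture data needed
// for the client's current synthetic viewpoint (or its view frustum) are
// retrieved and transmitted.
void HandleViewpointRequest(int clientId, const Viewpoint& vp,
                            const Display& d, double videoTime) {
    Frustum f = ComputeFrustum(vp, d);        // display needed for aspect ratio
    Payload p = LookUpFrustumData(f, videoTime);
    SendToClient(clientId, p);
}
```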

FIG. 3 depicts another exemplary computer-implemented process 300 for playing FVV content at a client according to the view frustum culling technique. As shown in block 302, a user installs a FVV player on a local client. The user selects and requests a desired FVV stored on a server, as shown in block 304. The client receives a message from the server that tells the client to instantiate a FVV player with controls appropriate to the FVV type of the desired FVV, as shown in block 306, and the client instantiates the FVV player, as shown in block 308. The client then requests a desired viewpoint or view frustum from the server, and also sends the client's display characteristics if they are necessary for the server to calculate the client's view frustum, as shown in block 310. The server renders the desired viewpoint for the desired FVV, and sends the client only the 3D geometry data and texture data (e.g., RGB data) necessary to render the client's viewpoint/view frustum of the desired FVV, as shown in block 312. The client combines the 3D geometry data and texture data to render the desired viewpoint/view frustum at the client, as shown in block 314. The client then checks for user viewpoint navigation input and, if there is any, the client sends the navigation input (e.g., a request for a new viewpoint) to the server (block 316). The server can then render a viewpoint of the FVV based on the received navigation input and send the geometry data and texture data needed for the client to render the FVV for the new viewpoint, which are received at the client, as shown in block 318, and blocks 310 through 318 can be repeated. For example, to change viewpoints, a new (typically user-specified) viewpoint is sent from the client to the server, and a new FVV or other 3D spatial video is initiated from the new viewpoint at the server. The 3D geometry and texture data associated with the new viewpoint are retrieved, the FVV is rendered at the server, and the 3D geometry and texture data necessary to render the FVV or 3D spatial video for the viewpoint or view frustum requested by the client are transmitted to the client until a new viewpoint request is received.
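
Blocks 310 through 318 form a loop on the client. The sketch below compresses that loop into a few hypothetical helpers with placeholder bodies; the real player, transport, and renderer are outside the scope of this text.

```cpp
// Client-side sketch of blocks 310-318. Each pass requests a viewpoint,
// receives only the geometry and texture for its frustum, renders, then
// forwards any user navigation input to the server.
#include <optional>

struct Viewpoint   { float pos[3], dir[3]; };
struct FrustumData { /* 3D geometry + texture blobs for one frustum */ };

FrustumData RequestFrustumData(const Viewpoint&) { return {}; }    // blocks 310/312
void Render(const FrustumData&) {}                                 // block 314
std::optional<Viewpoint> PollNavigationInput() { return std::nullopt; } // block 316

void PlaybackLoop(Viewpoint vp, bool (*stillPlaying)()) {
    while (stillPlaying()) {
        FrustumData data = RequestFrustumData(vp);  // server culls to the frustum
        Render(data);
        if (auto nav = PollNavigationInput())       // new viewpoint requested?
            vp = *nav;                              // blocks 310-318 repeat
    }
}
```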

As described with respect to FIG. 1, a modification to the exemplary process described in FIG. 3 is that, in addition to only the data necessary to render a specific viewpoint or view frustum, some additional texture data and corresponding geometric data at the edges of the view frustum are sent to the client along with the 3D geometry and texture data necessary to render the viewpoint requested by the client. As discussed above with respect to FIG. 1, the client's viewpoint can be predicted based on the client's rate of viewpoint change; a lower level of detail of geometric data can be sent to the client for viewpoints that the client has a lower probability of reaching; and a lower level of detail of texture data and geometric data can be sent for objects in the distance of the client's view frustum.

In some embodiments of the technique, the geometric data and texture data are stored as a spatial representation of all viewpoints possible. For example, the spatial representation of all viewpoints possible can be defined by three dimensional cells as shown in FIG. 4. A large three dimensional cell 402 can be sub-divided into smaller three dimensional cells 404, and these smaller three dimensional cells can further be sub-divided into even smaller three dimensional cells 406. The server can store the geometric data and texture data of the FVV in the increasingly sub-divided three dimensional cells, and the client can request specific cells corresponding to a desired viewpoint or view frustum to be rendered. Alternately, the server can compute the cells to send to the client based on a viewpoint received from the client that the client wishes to render. In any of these embodiments, the three dimensional cells can be stored in a compressed format. The cells can also be used to provide the desired level of detail of texture data or geometric data. It should be noted that any spatial data structure can be used to represent the three dimensional cells discussed above. For example, an octree, a kd-tree, or a bounding volume hierarchy structure could be used.
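
Selecting cells for a requested view frustum can be pictured as a coarse-to-fine traversal of the sub-divided cells. The sketch below assumes an octree (one of the structures named above); the `Intersects` test is stubbed and would in practice be a plane test like the one sketched earlier.

```cpp
// Sketch of cell selection over the FIG. 4 subdivision, assuming an octree.
#include <cstdint>
#include <memory>
#include <vector>

struct AABB    { float min[3], max[3]; };
struct Frustum { float planes[6][4]; };            // six frustum planes

bool Intersects(const Frustum&, const AABB&) {
    return true;  // stand-in: a real test checks the box against all six planes
}

struct OctreeNode {
    AABB bounds;                                   // cell extent (cells 402/404/406)
    std::vector<uint8_t> payload;                  // compressed geometry/texture data
    std::unique_ptr<OctreeNode> child[8];          // all null for the smallest cells
};

// Gather every leaf cell the view frustum touches, coarse to fine; cells that
// miss the frustum are skipped entirely, so their data is never transmitted.
void CollectVisibleCells(const OctreeNode& n, const Frustum& f,
                         std::vector<const OctreeNode*>& out) {
    if (!Intersects(f, n.bounds)) return;          // culled subtree
    bool leaf = true;
    for (const auto& c : n.child)
        if (c) { leaf = false; CollectVisibleCells(*c, f, out); }
    if (leaf) out.push_back(&n);                   // a cell to send to the client
}
```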

Exemplary processes for practicing the view frustum culling technique having been described, the following section discusses an exemplary architecture for practicing the technique.

1.3 Exemplary Architecture

FIG. 5 shows an exemplary architecture 500 for practicing one embodiment of the view frustum culling technique. As shown in FIG. 5, this exemplary architecture 500 includes a server 502, which can be a general purpose computing device 700, discussed in greater detail with respect to FIG. 7. The server 502 includes a database 504 of FVV/spatial 3D videos 506. For each of the videos 506, the database 504 includes the texture data and geometric data for rendering all of the synthetic viewpoints of each of the FVVs. The geometric data and texture data stored in the database 504 may have been previously calculated at the server via conventional means. Only the texture data and geometric data necessary to render a desired viewpoint or view frustum at the client are sent to the client. If the client 508 only provides a given viewpoint, the server 502 can compute the client's view frustum in a view frustum computation module 510. Likewise, the client can compute the client's view frustum in a view frustum computation module 512 on the client. The server 502 can determine which geometric data and texture data to send to the client by rendering the desired FVV for the desired viewpoint in a 3D renderer 514.

The client 508 includes a FVV or spatial video player 516 which can be used to view and navigate through a FVV or other 3D spatial video. The client 508 also includes a user interface 518 that includes a display and that allows a user 520 of the client 508 to input user data such as, for example, the particular video 506 that the user would like to interact with, the viewpoint or view frustum the user would like to view, changes in the viewpoint, and so forth. The client 508 also has a 3D renderer 522 that can render the given viewpoint of the desired free viewpoint video 506 at the client 508 using the downloaded texture and geometric data for the desired viewpoint. The client 508 can also include a data store 524 that can store various data, such as, for example, geometric and texture data previously sent to the client 508 from the server 502, so that the data does not have to be retransmitted from the server once it has been sent. Furthermore, the client 508 can also include a viewpoint predictor 526 that predicts a viewpoint in the free viewpoint video based on viewpoint navigation changes requested by the client or computed using a rate of change of the viewpoint that the client is viewing. If the client does not compute the predicted viewpoint, the server can instead employ a viewpoint prediction module 528 to compute the predicted viewpoint based on the viewpoint navigation updates. Additionally, the client can employ a level of detail computation module 530 that can compute the level of detail for an image or geometric data best suited to display far away objects or other objects that can be displayed with less detail in the free viewpoint video. Likewise, the server can also have a level of detail computation module 532 that can compute the level of detail for an image or geometric data best suited to display objects that can be rendered with less detail in the free viewpoint video.

In one embodiment of the view frustum culling technique, the architecture 500 could be used in the following manner to render a free viewpoint video at a client 508. The client 508 sends a request 534 for a specific free viewpoint video to the server 502. The server 502 then sends a command 536 to instantiate the FVV player 516 for the chosen video to the client 508. The client 508 instantiates the FVV player 516 and sends a request 538 for a current viewpoint of the FVV. The server 502 then sends the geometry and texture data necessary to render only the current viewpoint of the chosen FVV 540. The client 508 then renders the desired viewpoint of the desired FVV at the client using the received geometry and texture data. The client 508 can then send an updated desired viewpoint or rate of change of the viewpoint 542 to the server 502, and in return the server 502 can send the geometry and texture data to render the desired updated viewpoint or a predicted viewpoint based on the viewpoint rate of change 544.
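
The exchange just described involves five kinds of messages. The structs below are purely hypothetical wire formats that name those five steps; the text does not define a protocol, and the comments tie each struct to the reference numerals of FIG. 5.

```cpp
// Hypothetical message definitions for the FIG. 5 exchange (illustrative only).
#include <cstdint>
#include <string>
#include <vector>

struct SelectVideo       { std::string videoId; };      // request 534
struct InstantiatePlayer { uint32_t fvvType; };         // command 536
struct ViewpointRequest  {                              // requests 538 and 542
    float pos[3], dir[3];                               // desired viewpoint
    float velocity[3];                                  // rate of change, for prediction
};
struct FrustumPayload    {                              // replies 540 and 544
    std::vector<uint8_t> geometry;                      // only what the frustum needs
    std::vector<uint8_t> texture;                       // e.g., RGB texture data
};
```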

As discussed previously, some embodiments of the view frustum culling technique send, in addition to only the data necessary to render a specific viewpoint or view frustum, some additional spatial or temporal data from the server to the client. Small changes in the spatial or temporal navigation are anticipated, and the geometric and texture data are sent to the client prior to rendering. For example, additional texture data and corresponding geometric data at the edges of the client's requested viewpoint or view frustum are sent to the client in addition to the 3D geometry and texture data necessary to render the viewpoint requested by the client. In this case the client's viewpoint can be predicted based on the client's rate of viewpoint change in a viewpoint prediction module 528 on the server or in a viewpoint prediction module 526 on the client. Additionally, a lower level of detail of geometric data can be computed in a level of detail computation module 532 and can be sent to the client for viewpoints that the client has a lower probability of reaching. Similarly, a lower level of detail of texture data and geometric data can be sent for objects in the distance of the client's view frustum. In one case a client may request a certain level of detail of geometric and/or texture data from the server, and in this case the client may determine the level of detail desired in a level of detail computation module 530 on the client.

1.4 Exemplary Spatial Video Pipeline

The view frustum culling technique described herein can be used in various scenarios. One way the technique can be used is in a system for generating Spatial Video (SV). The following paragraphs provide details of a spatial video pipeline in which the view frustum culling technique described herein can be used. The details of image capture, processing, storage and streaming, rendering, and the user experience discussed with respect to this exemplary spatial video pipeline can apply to various similar processing actions discussed with respect to the exemplary processes and the exemplary architecture of the view frustum culling technique discussed above.

Spatial Video (SV) provides next generation, interactive, and immersive video experiences relevant to both consumer entertainment and telepresence, leveraging applied technologies from Free Viewpoint Video (FVV). As such, SV encompasses a commercially viable system that supports the features required for capturing, processing, distributing, and viewing any type of FVV media in a number of different product configurations.

It is noted, however, that the view frustum culling technique embodiments described herein are not limited to only the exemplary FVV pipeline to be described. Rather, other FVV pipelines can also be employed to create and render video, as desired.

1.4.1 Spatial Video Pipeline

SV requires an end to end processing and playback pipeline for any type of FVV that can be captured. Such a pipeline 600 is shown in FIG. 6, the primary components of which include: Capture 602; Process 604; Storage/Streaming 606; Render 608; and the User Experience 610.

The SV Capture 602 stage of the pipeline supports any hardware used in an array to record a FVV scene. This includes the use of various different kinds of sensors (including video cameras and audio) for recording data. When sensors are arranged in 3D space relative to a scene, their type, position, and orientation are referred to as the camera geometry. The SV pipeline generates the calibrated camera geometry for static arrays of sensors as well as for moving sensors at every point in time during the capture of a FVV. The SV pipeline is designed to work with any type of sensor data from any kind of an array, including, but not limited to, RGB data from traditional cameras (including the use of structured light such as with Microsoft® Corporation's Kinect™), monochromatic cameras, or time of flight (TOF) sensors that generate depth maps and RGB data directly. The SV pipeline is able to determine the intrinsic and extrinsic characteristics of any sensor in the array at any point in time. Intrinsic parameters such as the focal length, principal point, skew coefficient, and distortions are required to understand the governing physics and optics of a given sensor. Extrinsic parameters include both rotations and translations which detail the spatial location of the sensor as well as the direction the sensor is pointing. Typically, a calibration setup procedure is carried out that is specific to the type, number, and placement of sensors. This data is often recorded in one or more calibration procedures prior to recording a specific FVV. If so, this data is imported into the SV pipeline in addition to any data recorded with the sensor array.
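
To make the intrinsic and extrinsic parameters concrete, the following sketch shows the standard pinhole projection they parameterize (lens distortion terms omitted; all names are illustrative and the numbers in `main` are hypothetical).

```cpp
// Pinhole projection from the calibration parameters named above:
// x_cam = R * x_world + t (extrinsics), then pixel = K * x_cam (intrinsics,
// with perspective divide).
#include <cstdio>

struct Intrinsics { float fx, fy; float cx, cy; float skew; };
struct Extrinsics { float R[3][3]; float t[3]; };   // rotation + translation

void ProjectPoint(const Intrinsics& K, const Extrinsics& E,
                  const float xw[3], float pixel[2]) {
    float xc[3];
    for (int i = 0; i < 3; ++i)                     // world -> camera frame
        xc[i] = E.R[i][0]*xw[0] + E.R[i][1]*xw[1] + E.R[i][2]*xw[2] + E.t[i];
    pixel[0] = (K.fx * xc[0] + K.skew * xc[1]) / xc[2] + K.cx;
    pixel[1] = (K.fy * xc[1]) / xc[2] + K.cy;
}

int main() {
    Intrinsics K{1000, 1000, 640, 360, 0};              // hypothetical 720p camera
    Extrinsics E{{{1,0,0},{0,1,0},{0,0,1}}, {0,0,0}};   // identity pose
    float world[3] = {0.1f, 0.2f, 2.0f}, p[2];
    ProjectPoint(K, E, world, p);
    std::printf("pixel: (%.1f, %.1f)\n", p[0], p[1]);   // (690.0, 460.0)
}
```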

Variability associated with the FVV scene as well as playback navigation may impact how many sensors are used to record the scene as well as which type of sensors are selected and their positioning. SV typically includes at minimum one RGB sensor as well as one or more sensors that can be used in combination to generate 3D geometry describing a scene. Outdoor and long distance recording favors both wide baseline and narrow baseline RGB stereo sensor pairs. Indoor conditions favor narrow baseline stereo IR using structured light, avoiding the dependency upon lighting variables. As the scene becomes more complex, for example as additional people are added, the use of additional sensors reduces the number of occluded areas within the scene—more complex scenes require better sensor coverage. Moreover, it is possible to capture an entire scene at one sensor density and, at the same time, to capture a secondary, higher resolution volume with additional moveable sensors targeting the secondary higher resolution area of the scene. As more sensors are used to reduce occlusion artifacts in the array, additional combinations of the sensors can also be used in processing, such as when a specific sensor is part of both a narrow baseline stereo pair as well as a different wide baseline stereo pair involving a third sensor.

The SV pipeline is designed to support any combination of sensors in any combination of positions.

The SV Process 604 stage of the pipeline takes sensor data and extracts 3D geometric information that describes the recorded scene both spatially and temporally. Different types of 3DR algorithms are used depending on the number and type of sensors, the input camera geometry, and whether processing is done in real time or asynchronously from the playback process. The output of the process stage is various geometric proxies which describe the scene as a function of time. Unlike video games or special effects technology, 3D geometry in the SV pipeline is created using automated computer vision 3DR algorithms with no human input required.

SV Storage and Streaming 606 methods are specific to different FVV product configurations, which can be segmented as: bidirectional live applications of FVV in telepresence, broadcast live applications of FVV, and asynchronous applications of FVV. Depending on details associated with these various product configurations, data is processed, stored, and distributed to end users in different manners.

The SV pipeline uses 3D reconstruction to process calibrated sensor data to create geometric proxies describing the FVV scene. The SV pipeline uses various 3D reconstruction approaches depending upon the type of sensors used to record the scene, the number of sensors, the positioning of the sensors relative to the scene, and how rapidly the scene needs to be reconstructed. 3D geometric proxies generated in this stage include depth maps, point based renderings, or higher order geometric forms such as planes, objects, billboards, models, or other high fidelity proxies such as mesh based representations. The SV Render 608 stage is based on image based rendering (IBR), since synthetic, or virtual, viewpoints of the scene are created using real images and different types of 3D geometry. SV Render 608 uses different IBR algorithms to render synthetic viewpoints based on variables associated with the product configuration, hardware platform, scene complexity, end user experience, input camera geometry, and the desired degree of viewpoint navigation in the final FVV. Therefore, different IBR algorithms are used in the SV Render stage to maximize photorealism from any necessary synthetic viewpoints during end user playback of a FVV.

When the SV pipeline is used in real time applications, sensor data must be captured, processed, transmitted, and rendered in less than one thirtieth of a second. Because of this constraint, the types of 3D reconstruction algorithms that can be used are limited to high performance algorithms. Primarily, 3D reconstruction that is used in real time includes point cloud based depictions of a scene or simplified proxies such as billboards or prior models which are either modified or animated. The use of active IR or structured light can assist in generating point clouds in real time since the pattern is known ahead of time. Algorithms that can be implemented in hardware are also favored.

Asynchronous 3D reconstruction removes the constraint of time from processing a FVV. This means that point based reconstructions of the scene can be used to generate higher fidelity geometric proxies, such as when point clouds are used as an input to create a geometric mesh describing surface geometry. The SV pipeline also allows multiple 3D reconstruction steps to be used when creating the most accurate geometric proxies describing the scene. For example, if a point cloud representation of the scene has been reconstructed, there may be some noisy or error prone stereo matches present that extend the boundary of the human silhouette, leading to the wrong textures appearing on a mesh surface. To remove these artifacts, the SV pipeline runs a segmentation process to separate the foreground from the background, so that points outside of the silhouette are rejected as outliers.

In another example of 3D reconstruction, a FVV is created with eight genlocked devices in a circular camera geometry, each device consisting of: 1 IR randomized structured light projector, 2 IR cameras, and 1 RGB camera. Firstly, IR images are used to generate a depth map. Multiple depth maps and RGB images from different devices are used to create a 3D point cloud. Multiple point clouds are combined and meshed. Finally, RGB image data is mapped to the geometric mesh in the final result, using a view dependent texture mapping approach which accurately represents specular textures such as skin.
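
The chain of stages in this example can be summarized as a skeleton, shown below. Every stage is a stub standing in for a substantial computer vision algorithm, and all type names are hypothetical; the sketch only fixes the order of operations described above.

```cpp
// Skeleton of the example reconstruction chain: IR stereo -> depth map ->
// merged point cloud -> mesh -> view-dependent textures (stubs throughout).
#include <array>
#include <vector>

struct IRImage {}; struct RGBImage {};
struct DepthMap {}; struct PointCloud { std::vector<float> xyz; };
struct Mesh {};     struct TexturedMesh {};

DepthMap   DepthFromIRStereo(const IRImage&, const IRImage&) { return {}; }
PointCloud PointsFromDepth(const DepthMap&, const RGBImage&) { return {}; }
PointCloud MergeClouds(const std::vector<PointCloud>&)       { return {}; }
Mesh       MeshFromPoints(const PointCloud&)                 { return {}; }
TexturedMesh ViewDependentTexture(const Mesh&,
                                  const std::vector<RGBImage>&) { return {}; }

struct Device { IRImage ir0, ir1; RGBImage rgb; };  // one of the eight genlocked units

TexturedMesh Reconstruct(const std::array<Device, 8>& rig) {
    std::vector<PointCloud> clouds;
    std::vector<RGBImage> rgbs;
    for (const Device& d : rig) {                   // depth per device, then fuse
        DepthMap depth = DepthFromIRStereo(d.ir0, d.ir1);
        clouds.push_back(PointsFromDepth(depth, d.rgb));
        rgbs.push_back(d.rgb);
    }
    Mesh mesh = MeshFromPoints(MergeClouds(clouds));
    return ViewDependentTexture(mesh, rgbs);        // specular-friendly texturing
}
```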

The SV User Experience 610 processes data so that navigation is possible with up to 6 degrees of freedom (DOF) during FVV playback. In non-live applications, temporal navigation is possible as well—this is spatiotemporal (or space-time) navigation. Viewpoint navigation means users can change their viewpoint (what is seen on a display interface) in real time, relative to moving video. In this way, the video viewpoint can be continuously controlled or updated during playback of a FVV scene.

2.0 Exemplary Operating Environments

The view frustum culling technique described herein is operational within numerous types of general purpose or special purpose computing system environments or configurations. FIG. 7 illustrates a simplified example of a general-purpose computer system on which various embodiments and elements of the view frustum culling technique, as described herein, may be implemented. It should be noted that any boxes that are represented by broken or dashed lines in FIG. 7 represent alternate embodiments of the simplified computing device, and that any or all of these alternate embodiments, as described below, may be used in combination with other alternate embodiments that are described throughout this document.

For example, FIG. 7 shows a general system diagram showing a simplified computing device 700. Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers, server computers, hand-held computing devices, laptop or mobile computers, communications devices such as cell phones and PDAs, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, etc.

To allow a device to implement the view frustum culling technique, the device should have a sufficient computational capability and system memory to enable basic computational operations. In particular, as illustrated by FIG. 7, the computational capability is generally illustrated by one or more processing unit(s) 710, and may also include one or more GPUs 715, either or both in communication with system memory 720. Note that the processing unit(s) 710 of the general computing device may be specialized microprocessors, such as a DSP, a VLIW, or other micro-controller, or can be conventional CPUs having one or more processing cores, including specialized GPU-based cores in a multi-core CPU.

In addition, the simplified computing device of FIG. 7 may also include other components, such as, for example, a communications interface 730. The simplified computing device of FIG. 7 may also include one or more conventional computer input devices 740 (e.g., pointing devices, keyboards, audio input devices, video input devices, haptic input devices, devices for receiving wired or wireless data transmissions, etc.). The simplified computing device of FIG. 7 may also include other optional components, such as, for example, one or more conventional computer output devices 750 (e.g., display device(s) 755, audio output devices, video output devices, devices for transmitting wired or wireless data transmissions, etc.). Note that typical communications interfaces 730, input devices 740, output devices 750, and storage devices 760 for general-purpose computers are well known to those skilled in the art, and will not be described in detail herein.

The simplified computing device of FIG. 7 may also include a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 700 via storage devices 760 and includes both volatile and nonvolatile media that is either removable 770 and/or non-removable 780, for storage of information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as DVDs, CDs, floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM, ROM, EEPROM, flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.

Storage of information such as computer-readable or computer-executable instructions, data structures, program modules, etc., can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. Note that the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, RF, infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.

Further, software, programs, and/or computer program products embodying some or all of the various embodiments of the view frustum culling technique described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.

Finally, the view frustum culling technique described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The embodiments described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Still further, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.

It should also be noted that any or all of the aforementioned alternate embodiments described herein may be used in any combination desired to form additional hybrid embodiments. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. The specific features and acts described above are disclosed as example forms of implementing the claims.

What is claimed is:
 1. A computer-implemented process for receiving spatial three dimensional video, comprising: using a client computing device for: receiving only texture data and geometric data for a given view frustum of a spatial three dimensional video from a server at a client; rendering the given viewpoint of the spatial three dimensional video at the client using the downloaded texture and geometric data for the given view frustum.
 2. The computer-implemented process of claim 1 wherein the client specifies the given view frustum to the server before the texture data and geometric data are downloaded to the client.
 3. The computer-implemented process of claim 1 wherein the client receives texture data and geometric data computed by the server based on a viewpoint received from the client.
 4. The computer-implemented process of claim 1, further comprising: checking if texture data or geometric data has been previously downloaded to the client; and not downloading again the texture data or the geometric data which has been previously downloaded to the client.
 5. The computer-implemented process of claim 1 wherein additional texture data and corresponding geometric data at the edges of the view frustum is received at the client.
 6. The computer-implemented process of claim 1 wherein the client's viewpoint is predicted based on the client's rate of viewpoint change.
 7. The computer-implemented process of claim 6 wherein the view frustum is expanded based on the client's predicted viewpoint.
 8. The computer-implemented process of claim 6 wherein a lower level of detail of geometric data is received at the client for viewpoints that the client has a lower probability of reaching.
 9. The computer-implemented process of claim 1 wherein a lower level of detail of texture data and geometric data is sent for objects in the distance of the client's view frustum.
 10. The computer-implemented process of claim 1 wherein the geometric data is stored as a spatial representation of all viewpoints possible.
 11. The computer-implemented process of claim 10 wherein the spatial representation of all viewpoints possible is defined by three dimensional cells.
 12. The computer-implemented process of claim 11 wherein the server stores the cells and wherein the client requests specific cells corresponding to a desired viewpoint to be rendered.
 13. The computer-implemented process of claim 11 wherein the server computes the cells to send to the client based on a viewpoint the client wishes to render.
 14. The computer-implemented process of claim 11 wherein the three dimensional cells are in a compressed format.
 15. A computer-implemented process for receiving free viewpoint video, comprising: using a client computing device for: installing a free viewpoint video player on a local client; selecting a free viewpoint video stored on a server; receiving a message from the server that tells the client to instantiate the free viewpoint video player with controls appropriate to the selected free viewpoint video type; instantiating the free viewpoint video player with controls appropriate to the selected free viewpoint video type; requesting a desired viewpoint of the selected free viewpoint video from the server; receiving only the necessary geometric and texture data to render the desired viewpoint of the selected free viewpoint video; and combining the received geometric and texture data to render the desired viewpoint of the free viewpoint video.
 16. The computer-implemented process of claim 15 further comprising: the client checking for user viewpoint navigation input; and if there is any user viewpoint navigation input, the client sending the navigation input to the server.
 17. The computer-implemented process of claim 16 wherein the server uses the client's navigation input to determine which 3D geometry and texture data to next send to the client.
 18. A system for providing free viewpoint video, comprising: a general purpose computing device; a computer program comprising program modules executable by the general purpose computing device, wherein the computing device is directed by the program modules of the computer program to: download only texture data and geometric data relevant to a given viewpoint of a free viewpoint video at a client; render the given viewpoint of the free viewpoint video at the client using only the downloaded texture and geometric data for the given viewpoint.
 19. The system of claim 18 wherein the downloaded texture data and the downloaded geometric data is downloaded from more than one server in a computing cloud.
 20. The system of claim 18 wherein the downloaded texture data and geometric data is slightly greater than required to render the given viewpoint.